Abstract

While much current web privacy research focuses on browser 
fingerprinting, the boring fact is that the majority of cur- 
rent third-party web tracking is conducted using traditional, 
persistent-state identifiers. One possible explanation for the 
privacy community’s focus on fingerprinting is that to date 
browsers have faced a lose-lose dilemma when dealing 
with third-party stateful identifiers: block state in third-party 
frames and break a significant number of webpages, or allow 
state in third-party frames and enable pervasive tracking. The 
alternative, middle-ground solutions that have been deployed 
all trade privacy for compatibility, rely on manually curated 
lists, or depend on the user to manage state and state-access 
themselves. 

This work furthers privacy on the web by presenting a 
novel system for managing the lifetime of third-party storage, 
“page-length storage”. We compare page-length storage to ex- 
isting approaches for managing third-party state and find that 
page-length storage has the privacy protections of the most 
restrictive current option (i.e., blocking third-party storage) 
but web-compatibility properties mostly similar to the least 
restrictive option (i.e., allowing all third-party storage). This 
work further compares page-length storage to an alternative 
third-party storage partitioning scheme inspired by elements 
of Safari’s tracking protections and finds that page-length stor- 
age provides superior privacy protections with comparable 
web-compatibility. 

We provide a dataset of the privacy and compatibility be- 
haviors observed when applying the compared third-party 
storage strategies on a crawl of the Tranco 1k and the quan- 
titative metrics used to demonstrate that page-length storage 
matches or surpasses existing approaches. Finally, we provide 
an open-source implementation of our page-length storage 
approach, implemented as patches against Chromium. 


1 Introduction 


Web trackers use a variety of techniques to track and vio- 
late privacy on the web. Tracking is usually done through a 
mix of stateful tracking (i.e., storing and transmitting unique
identifiers in the browser) and stateless tracking, or finger- 
printing (i.e., attempting to uniquely identify a browser based 
on unique configuration and execution environment charac- 
teristics). 

Though much recent privacy work has focused on stateless, 
fingerprinting-based tracking, we expect that the majority 
of tracking is still done using traditional stateful methods. 
This intuition is based on multiple factors, such as adtech 
uproar over Google’s recent announcement [2] to stop sending 
cookies (only one of many ways of storing identifiers) to 
third-parties in the future, prior research demonstrating the 
popularity of storage-based tracking [22, 24, 41, 42, 49, 52],
and expert insight from browser developers. 

While the privacy community has had some success in 
designing defenses to stateless, fingerprinting tracking that 
protect users without breaking benign, user-serving page func- 
tionality [31,37], researchers, industry and activists have been 
less successful in designing practical, robust defenses against 
webscale stateful third-party tracking. 

Despite the press and attention that the “end of third-party 
cookies” has received, blocking third-party cookies (i.e., not 
sending cookies on requests for third-party sub-resources) 
does not provide any fundamental protections against stateful 
third-party tracking. Blocking third-party cookies is a positive
step for web privacy, but only because it prevents categories
of accidental tracking or information disclosure, not because
it prevents intentional tracking. Third-party frames can still
access the same cookies¹, localStorage², IndexedDB³, or other
JavaScript-accessible storage methods (sometimes collectively
called "DOM Storage"). In short, blocking third-party cookies
is a necessary but insufficient part of solving the general


¹ For completeness, we note that this isn’t completely true, and that
HttpOnly cookies cannot be accessed from JavaScript. But since HttpOnly
doesn’t provide protection against intentional tracking (since such
trackers could just omit the HttpOnly instruction), we don’t consider
HttpOnly further in this work.

² https://html.spec.whatwg.org/multipage/webstorage.html#the-localstorage-attribute

³ https://www.w3.org/TR/IndexedDB-2/


problem of preventing stateful third-party tracking. 

Though some browser vendors have taken some steps to 
address third-party stateful tracking, each approach has sig- 
nificant shortcomings and limitations. The details of each 
technique are described in Section 2.3, but at a high level, 
deployed approaches are incomplete and insufficient, either 
because they depend on curated lists and heuristics (i.e., Fire- 
fox and Edge), address tracking across sites but not time (i.e., 
Safari), defer the question to non-expert users (i.e., Storage
Access API⁴), or provide strong protections against tracking
but break sites for users (i.e., Brave). 

We argue that practical, robust protections against stateful, 
third-party tracking should have at least three properties. 


1. Cross-site protection: prevent third-parties from using 
stored identifiers to link browsing behavior across first- 
party sites. 


2. Cross-time protection: prevent third-parties from using 
stored identifiers to link browsing behavior on the same 
first-party site across time. 


3. Web Compatibility: not affect, or only minimally impact,
user-serving, non-privacy-harming behavior in third-party
frames.


In this work we aim to improve web privacy by presenting a 
new method of managing and limiting third-party state that we 
call “page-length storage”. Section 3.1 presents the approach 
in detail, but at a high level, “page-length storage” is the 
unique combination of two features: 


1. page-length storage partitions third-party state by the 
top level document. If a browser tab has loaded a page 
from origin A, and that page includes two sub-documents 
(i.e., <iframe>s) from origin B, the two sub-documents 
see the same storage, but different storage than B sees 
when B is the top-level document, and also different 
storage than origin B sub-documents on other pages and 
tabs. 


2. page-length storage sets the lifetime of all third-party
state to be equal to the lifetime of the top level document.
If a page from origin A opens and closes <iframe>s 
from origin B, all of those origin B frames see the same 
storage, even between frames being opened and closed. 
However, once the top-level page is closed, all the par- 
titioned storage for B is cleared as well. Revisiting or 
reloading the top-level page will result in the B frames 
seeing empty storage. 


Contributions. More concretely, this work makes the fol- 
lowing contributions to improving privacy on the web. 


⁴ https://developer.mozilla.org/en-US/docs/Web/API/Storage_Access_API


1. The design of page-length storage, a novel approach to 
managing third-party state in web pages that provides 
strong privacy protections without breaking websites. 


2. New, general metrics for measuring the privacy and 
web-compatibility properties of third-party storage poli- 
cies. 


3. An open-source, prototype implementation of page- 
length storage as a set of patches to Chromium [13]. 


4. A public dataset of applying four storage policies to 
the Tranco 1k [43], a research-focused ranking of pop- 
ular sites. Our dataset [13] includes the above privacy 
and compatibility metrics generated from four policies, 
each approximating a third-party storage policy currently 
deployed in a popular browser. 


2 Background & Motivation 


Modern browser technologies and the security policies that 
govern them are complex, so we must clearly define our terms 
and provide essential background on browser storage policy, 
user tracking techniques, and the state of the art in tracking 
countermeasures. 


2.1 Same-Origin Policy & Storage Basics 


Sites & Origins. Browsers isolate storage (e.g., cookies, 
localStorage, IndexedDB) according to the Same-Origin Policy
(SOP) [11]. The SOP has grown complex, multifaceted,
and inconsistent [46], and applies to many aspects of the web; 
here we describe only its most basic and universal elements, 
particularly as they relate to storage. An origin comprises 
a scheme (e.g., https), a complete DNS hostname, and an 
optional TCP port number. All state-impacting activities in 
a browser are associated with an origin derived from some 
relevant URL. For example, a script’s execution origin is de- 
rived from the URL of the frame in which the script executes, 
and an HTTP request origin is derived from the URL being 
fetched. 

Many activities are restricted to same-origin boundaries. 
For example, a script executing in origin A cannot access 
cookies stored for origin B. This is true even when a sub- 
document from origin B is embedded in a page from origin A. 
Storage is strictly isolated according to SOP: scripts can ac- 
cess cookies and DOM storage (e.g., localStorage) only for 
their execution origin, and HTTP requests store and transmit 
cookies only for their destination origin. 
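The origin definition above can be illustrated with a minimal sketch. The helper names `origin_of` and `same_origin` are hypothetical, and the sketch assumes only the http and https schemes, filling in their default ports when a URL omits one.

```python
from urllib.parse import urlsplit

# Assumption for this sketch: only http and https are handled, and a URL
# without an explicit port is given the scheme's default port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url: str) -> tuple:
    """Return the (scheme, host, port) triple identifying a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(url_a: str, url_b: str) -> bool:
    """Two URLs are same-origin only if scheme, host, and port all match."""
    return origin_of(url_a) == origin_of(url_b)
```

Under this sketch, `https://a.example/` and `http://a.example/` are different origins, as are `https://a.example/` and `https://www.a.example/`, so neither pair would share cookies or DOM storage.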


First and Third Parties. We now define two terms used 
through the rest of this paper, first-party and third-party. These 
terms are not unique to this work, but are frequently used to 
mean similar but not-quite-the-same things in research and 
web standards, so we define their use in this work explicitly. 


Figure 1: Third-party storage (a) fully allowed (traditional/permissive), (b) fully blocked, (c) partitioned by first-party site, and (d) scoped to hosting
page lifetime (page-length storage, our proposal). A, B, & T are distinct domains; T is embedded as a third-party within A & B.


When loading a website, the first-party is the “site” por- 
tion of the top level document. This is the eTLD+1 of the URL 
shown in the navigation bar of the browser. Any sub-resources 
or sub-documents included in the page are considered first- 
party if they’re fetched from the same eTLD+1 as the top 
level document. Third-parties, then, are any sites other than
that of the top-level document. A sub-document (i.e., <iframe>)
is considered third-party if it is fetched from any origin not
equal to the top-level document; a script is third-party if it
was fetched from a site different from the top level document,
and so forth.


Finally, we note that when applying SOP to the web, what 
determines storage access is the “site” of the frame including a 
script, not the script itself. So if a page from origin A includes 
a script loaded from origin B, the script is third-party, but has 
access to the first-party storage. What storage area a script 
has access to is determined by the “site” of the page, not the 
“site” of the script. 
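A minimal sketch of these definitions follows. It assumes a toy stand-in for the Public Suffix List (real browsers consult the full list), and the names `site_of`, `is_third_party`, and `storage_site_for_script` are hypothetical helpers introduced only for illustration.

```python
from urllib.parse import urlsplit

# Toy stand-in for the Public Suffix List (an assumption of this sketch).
PUBLIC_SUFFIXES = {"com", "org", "net", "co.uk"}


def site_of(url: str) -> str:
    """Return the eTLD+1 ("site") of a URL under the toy suffix list."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest; on a match,
    # keep the suffix plus one additional label to its left.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host


def is_third_party(top_level_url: str, resource_url: str) -> bool:
    """A resource is third-party if its site differs from the top-level site."""
    return site_of(top_level_url) != site_of(resource_url)


def storage_site_for_script(frame_url: str, script_url: str) -> str:
    """Storage access follows the frame including the script, not the script URL."""
    return site_of(frame_url)
```

In this sketch a script fetched from `cdn.tracker.net` but included directly in a page on `news.example.com` is third-party, yet `storage_site_for_script` still resolves to `example.com`, matching the observation that such a script sees first-party storage.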


2.2 User Tracking 


Types of Behaviors Tracked. This work uses the term 
“tracking” to refer to a third-party re-identifying a visitor 
across visits to first-party sites. Unless otherwise specified, 
we use “tracking” to refer both to cross-site tracking (i.e., 
a third-party can link an individual’s behavior across first- 
parties) and cross-time tracking (i.e., a third-party can iden- 
tify the same person returning to the same first-party across 
sessions). 


Stateful Tracking. The oldest, simplest and most common 
form of online tracking is “stateful” tracking, where a third- 
party stores a unique value on the user’s browser, and reads 
that value back across different first-parties. While the terms 
explicit and inferred used by Roesner et al. [45] appear 
more precise, as both techniques involve state of some kind, 
the stateful/stateless terminology popularized by Mayer and 
Mitchell [35] appears dominant in subsequent research. 

In the simplest case, stateful tracking works as follows. 
Sites A and B both include an iframe embedding site C. When 
the embedded site C is loaded, it looks to see if a unique 
identifier has been set, and if not it generates and stores one, 


using any of the storage methods provided by the browser
(cookies, localStorage, IndexedDB, etc.). Embedded site C
then checks which site it is being embedded in, and
sends a message back to site C, recording that the same user
visited both sites A and B.

Stateful tracking, at root, relies on a site being able to access 
persistent state in different contexts, and using the persistently 
stored state to link (conceptually) unrelated behavior. To build 
on the previous example, site C is able to track the user across 
A and B because C sees the same storage values when
embedded in sites A and B, even though site A might not 
have any direct relationship with site B. 
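The simplest case described above can be simulated in a few lines. This is a toy model under a permissive policy, not a description of any real tracker; `PermissiveBrowser`, `Tracker`, and the `uid` key are hypothetical names.

```python
import uuid


class PermissiveBrowser:
    """Toy browser: one storage area per third-party site, shared everywhere."""

    def __init__(self):
        self.storage = {}  # third-party site -> dict of stored values

    def storage_for(self, third_party: str) -> dict:
        return self.storage.setdefault(third_party, {})


class Tracker:
    """Hypothetical embedded site C that logs (identifier, first-party) pairs."""

    def __init__(self, site: str):
        self.site = site
        self.log = []  # server-side record of observed visits

    def run_in_frame(self, browser: PermissiveBrowser, first_party: str) -> None:
        store = browser.storage_for(self.site)
        # Generate and store an identifier only if none has been set yet.
        if "uid" not in store:
            store["uid"] = str(uuid.uuid4())
        # Report which first-party embedded the frame for this identifier.
        self.log.append((store["uid"], first_party))

    def sites_seen_per_user(self) -> dict:
        """Group the logged first-party sites by stored identifier."""
        linked = {}
        for uid, first_party in self.log:
            linked.setdefault(uid, set()).add(first_party)
        return linked
```

Running the frame once on site A and once on site B against the same `PermissiveBrowser` yields a single identifier linked to both sites, which is exactly the linkage the defenses discussed later aim to break.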

As we will discuss at length later, approaches for preventing
stateful tracking involve either preventing third-parties
from storing values, providing third-parties with different stored
values when embedded in different contexts, or combinations of
the two.


Other Tracking Techniques. While stateful tracking is the
simplest, and likely the most common, method for tracking
users online, third-parties also track users in other ways.
This work focuses on stateful tracking, but we briefly
describe these other, non-stateful techniques
for completeness:


• Browser fingerprinting refers to uniquely identifying a
browser (or browser user) not through the storage and 
transmission of a unique identifier, but by identifying 
unique characteristics of the browser’s configuration 
(e.g., plugins, preferred language, “dark mode”) and ex- 
ecution environment (e.g., operating system, hardware 
capabilities). 


• Server-side tracking is a broad term that loosely means
tracking users across sites not through stored identifiers
(i.e., stateful tracking) or unique configuration (i.e.,
fingerprinting), but through information the user provides
to the site. For example, if a user uses the same email
address when registering on two different sites, a tracker
could later use the repeated email address to link the
user’s behavior across sites.


2.2.1 Focus on Stateful Tracking 


This work presents a novel solution for preventing stateful 
cross-site and cross-time tracking. We aim to improve protections
against stateful tracking because we think it is where
browsers most lack practical, robust, compatible defenses.
While significant research has gone into building web-compatible
defenses against stateless tracking (e.g., [31,37]),
the existing techniques for preventing stateful third-party 
tracking are either incomplete (i.e., they still allow signifi- 
cant privacy harm to occur) or incompatible (i.e., they break 
a significant number of websites). 


2.3 Deployed Stateful Tracking Defenses 


Real-world countermeasures currently deployed in produc- 
tion browsers illustrate a range of possible tradeoffs between 
privacy and compatibility. We rely heavily on the community- 
curated Cookie Status project [3] for up to date policy imple- 
mentation details. 


Brave: Block all Third-Party State. Brave [16], a 
Chromium fork featuring aggressive privacy protections 
called “Shields”, defaults to blocking all forms of third- 
party storage. The officially correct way to block persistent 
third-party storage involves raising a JavaScript exception 
on script access to blocked storage APIs [10]. Few sites 
are prepared to handle these exceptions, however, and Brave 
improves compatibility by instead simply turning blocked 
third-party storage API accesses into no-ops. Brave also uses 
a whitelist to allow a small number of high-profile third- 
parties to use persistent storage in the context of specific 
first-party sites (e.g., googleusercontent.com when embed- 
ded from google.com) [7]. Brave’s approach results in strong 
privacy protections at the cost of a higher incidence of site 
breakage, which may require users to selectively lower its 
Shields on incompatible sites. 


Safari: Partition Third-Party State. Apple Safari [15] fea- 
tures Intelligent Tracking Prevention (ITP), a combination of 
storage restriction policies, opt-in APIs, and on-client classi- 
fication of tracking domains via machine-learning [12]. Sa- 
fari never transmits cookies on third-party HTTP requests. 
Cookie and localStorage access in third-party frames are 
partitioned on first-party site identity to prevent stateful lateral 
tracking (as in Figure 1c). Safari provides developers with 
a requestStorageAccess API to request user permission 
to access unpartitioned third-party storage across first-party 
contexts. This opt-in approach allows users to accept the po- 
tential for lateral tracking in exchange for useful functionality 
such as cross-site login state. 

Safari features a number of additional policies and heuris- 
tics to restrict the lifetime of items stored by domains ITP 
has classified as probable trackers. These restrictions im- 
pact but do not categorically eliminate potential for longi- 
tudinal tracking by third-parties. The Safari ITP approach 
provides strong cross-site tracking protections while avoiding 
full-blocking with its associated site breakage, but it does 
not eliminate across-time tracking by third-parties, and the 
non-deterministic impact of machine-learning on its policy 
enforcement can make it challenging for web developers to 
reason about. 


Firefox and Edge: Restrict Known Bad Actors. Mozilla 
Firefox [6] has adopted a selective storage policy that depends 
on the Disconnect [8] list of curated tracking domains. In 
general, third-party origins not found in the Disconnect list are 
granted unrestricted access to third-party storage. Third-party
origins classified as trackers by Disconnect are given access to 
third-party storage on the first five first-party sites embedding 


that third-party origin. On each additional first-party site
embedding that third-party origin, the user is shown an opt-in
prompt, which must be accepted before that origin can use
third-party storage on that first-party site.

Exceptions to these restrictions are made for first-party 
domains identified by the Disconnect list as related to specific 
third-party origins (e.g., googleusercontent.com embedded 
on google.com). The Firefox approach is one of compromise: 
well-known tracking domains face restrictions on the reach 
of their lateral tracking, but protection depends heavily on the 
validity and coverage of the underlying filter list. 

Microsoft Edge [4] has begun to deploy filter list-based stor- 
age restrictions similar to those performed by Firefox, with 
all the benefits and drawbacks of this compromise approach 
summarized above. 


Chrome: Unrestricted Third-Party State. Google 
Chrome [9], in contrast to all of the above, permits full third- 
party storage use, including sending cookies on HTTP re- 
quests to third-party resources. Google has announced inten- 
tions to phase out third-party cookie support [2] in the near 
future; technical details remain vague, but their wording im- 
plies eliminating only cookies on third-party HTTP requests, 
not restricting third-party storage in general. Chrome domi- 
nates as the world’s most popular browser for both desktop 
and mobile markets [1], understandably prompting web devel- 
opers to target its behavior for maximum compatibility, and 
indirectly perpetuating the status quo of stateful tracking in 
the process. 


2.4 Compatibility and Tracking Protections 


Finally, we present some ways that existing protections 
against third-party stateful tracking break websites. We 
present these as moderating examples, and useful constraints, 
in designing page-length storage. Without considering these 
compatibility concerns, solutions will tend toward simplistic,
“block-everything” approaches that end up not being useful,
and so not being effective in protecting privacy.

We gather the following examples from Brave’s public issue
tracker⁵. We pull from Brave’s breakage reports since
Brave has the most aggressive restrictions on third-party 
storage of the surveyed browsers, and so the largest num- 
ber of storage-related compatibility problems. Nevertheless, 
we present these as examples of the kinds of compatibility 
problems that third-party storage protections can cause. 


2.4.1 Uncaught Exceptions from Blocking Storage 


Strict third-party storage blocking breaks embedded
SlideShare slide show widgets (e.g.,
https://support.blogactiv.eu/2015/04/24/how-to-embed-slideshare/)
on Chrome. The widget becomes inert, not


⁵ https://github.com/brave/brave-browser/issues


Figure 2: Stock market graph broken by strict third-party
storage blocking (left) and working with page-length storage (right).


responding to clicks, when Chrome’s implementation of strict 
third-party storage blocking (correctly, per the specification) 
raises JavaScript exceptions on access to storage APIs. 
Brave’s silent no-op implementation of strict third-party 
storage blocking is sufficient to prevent breakage in this 
case; successful storage access is clearly not essential to this 
widget’s functionality. 

A similar example is provided by a data plot widget
broken by strict third-party storage blocking (e.g.,
https://www.otcmarkets.com/stock/NSRGY/overview). Once
again, strict third-party storage blocking causes a JavaScript 
run-time error which results in a blank data plot (see Figure 2). 
In this case, Brave’s silent no-op blocking does not help: the 
error is caused by property access on a null value returned 
from a no-op API stub. 
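The two failure modes above can be contrasted in a small simulation. This is a Python stand-in for what is JavaScript behavior in the browser, and every class and function name here is hypothetical; the widgets are simplified caricatures, not the actual SlideShare or OTC Markets code.

```python
class BlockedStorageError(Exception):
    """Stands in for the JavaScript exception raised on blocked storage access."""


class ThrowingStorage:
    """Specification-style blocking: every access raises."""

    def get_item(self, key):
        raise BlockedStorageError("third-party storage is blocked")

    def set_item(self, key, value):
        raise BlockedStorageError("third-party storage is blocked")


class NoOpStorage:
    """Silent blocking: writes are dropped and reads return a null value."""

    def get_item(self, key):
        return None

    def set_item(self, key, value):
        return None


class WorkingStorage:
    """Fully functional storage, as a frame would see when storage is allowed."""

    def __init__(self):
        self._items = {}

    def get_item(self, key):
        return self._items.get(key)

    def set_item(self, key, value):
        self._items[key] = value


def optional_storage_widget(storage) -> str:
    """Uses storage opportunistically but never handles a raised exception."""
    storage.set_item("last_slide", "3")
    return "rendered"


def read_back_widget(storage) -> str:
    """Writes a value and assumes it can be read straight back."""
    storage.set_item("range", "1y")
    return "plot:" + storage.get_item("range").upper()


def try_render(widget, storage) -> str:
    """Report 'broken' when the widget dies with an uncaught error."""
    try:
        return widget(storage)
    except (BlockedStorageError, AttributeError):
        return "broken"
```

In this sketch the first widget breaks only under `ThrowingStorage`, while the second also breaks under `NoOpStorage` because it dereferences the null value returned by the stub; both render when storage actually works.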


2.4.2 Breaking Cookie-Based Third-Party Sessions 


A server-side example of strict third-party storage blocking 
causing breakage is provided by a live code editing/running 
widget embedded in the R language documentation (e.g.,
https://www.rdocumentation.org/packages/grid/versions/3.6.2/topics/grid.plot.and.legend).
The
embedded widget tries to establish a cookie-based session 
with third-party domain multiplexer-prod.datacamp.com. 
Failure to persist third-party cookies results in HTTP 403 
errors on subsequent HTTP requests, preventing code 
execution and output display. 

A broken video player on a popular commen- 
tary and analysis site provides another example 
(https://fivethirtyeight.com/videos/do-you-buy-that-biden-should-pick-a-running-mate-from-a-swing-state/).
With third-party storage blocked, the video
player remains blank indefinitely. In this case, the video 
player functionality is broken because the frame attempts to 
use localStorage to persist values across pages. 


3 Design & Implementation 


We propose and prototype a novel browser storage policy that 
prevents both cross-site and cross-time stateful tracking while 
measurably improving site compatibility over traditional third- 
party storage blocking. Page-length storage prevents stateful 
third-party tracking while minimizing site breakage by mak- 
ing third-party storage fully functional within a strictly iso- 
lated, ephemeral scope. We also provide a brief overview of 
how we developed and tested our page-length storage proto- 
type within a Chromium-based browser. 


3.1 Policy Design 


The key insight behind page-length storage is that site break- 
age can be minimized without compromising privacy by mak- 
ing all interactions with third-party storage behave normally 
(i.e., permissively), but only within the isolated, ephemeral 
scope of the containing page’s lifespan. The containing page 
is the top-level frame, loaded from the URL displayed in the 
browser’s navigation UI (e.g., address bar). Its lifespan extends
from the moment the top-level frame commits to
loading that document URL to the moment any navigation
event (including even reloads of the same URL) discards the
contents of the top-level frame.

Within the isolated, ephemeral scope of each top-level 
page object’s lifespan, third-party storage access behaves nor- 
mally for both scripts executing in third-party <iframe>s and 
HTTP requests to third-party domains. SOP enforcement is 
unchanged. All storage behaves in the traditional, permissive 
way with one exception: third-party storage starts out empty 
and is discarded along with the top-level page object on top- 
frame navigation. Any site embedding third-party content 
which functions correctly under permissive policy for first- 
time visitors with empty cookie jars should function correctly 
under page-length storage policy. 

Isolating third-party storage to single page lifespans provides
a good tracking vs. compatibility compromise. Page-length
storage prevents stateful cross-site and cross-time tracking
automatically, as third-parties cannot “remember” anything
past top-level page (re-)loads. Compare Figures 1a and
1d. As a practical matter, third-party content cannot silently
manipulate top-level page navigation, so it cannot test whether 
third-party storage will persist beyond top-level navigations. 
All tests that can be done silently within the scope of a single 
page’s lifespan will appear fully functional, as for a first-time 
visitor with uninitialized third-party storage. 

Some hypothetical examples illustrate the impact of this 
approach. 

First, consider two <iframe>s from the same third-party 
embedded on a single page document: these will share the 
same ephemeral third-party storage partition and so can use all 
forms of third-party storage, via both script access and HTTP 
cookies, to communicate with each other and the remote third- 


party origin for the duration of the embedding page’s lifespan. 

Second, consider a third-party <iframe> embedded on a 
page that is loaded on two different tabs simultaneously: each 
instance of the frame is using its parent-page’s ephemeral 
third-party storage partition, so no cross-site stateful shar- 
ing/communicating is possible between them. 

Third, consider a third-party <iframe> embedded on a 
page that is loaded and then reloaded in the same tab: each 
page load (regardless of URL) discards the previous page’s 
ephemeral third-party storage partition, so no cross-time state- 
ful sharing/communicating is possible. 

Finally, consider a third-party <iframe> embedded in two 
pages hosted on different first-party domains: whether these 
pages are loaded sequentially in one tab, or simultaneously in 
two tabs, each third-party frame is using its own parent-page’s 
ephemeral third-party storage partition, so again no cross-site 
stateful sharing/communicating is possible. 
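The four hypothetical examples can be checked against a toy model of the policy. This sketch is not the Chromium prototype; `PageLengthBrowser`, `navigate`, `frame_storage`, and `run_examples` are hypothetical names, and a tab is represented as a plain dictionary for brevity.

```python
import itertools


class PageLengthBrowser:
    """Toy model: third-party storage is keyed by (page load, third-party site)."""

    def __init__(self):
        self._load_ids = itertools.count(1)
        self._partitions = {}  # (load_id, third_party_site) -> stored values

    def navigate(self, tab: dict, url: str) -> None:
        """Commit a top-level navigation (or reload) in `tab`."""
        old_load = tab.get("load_id")
        # Discard every partition that belonged to the page being replaced.
        self._partitions = {
            key: values
            for key, values in self._partitions.items()
            if key[0] != old_load
        }
        tab["load_id"] = next(self._load_ids)
        tab["url"] = url

    def frame_storage(self, tab: dict, third_party_site: str) -> dict:
        """Storage seen by a third-party frame on the tab's current page."""
        key = (tab["load_id"], third_party_site)
        return self._partitions.setdefault(key, {})


def run_examples() -> dict:
    browser = PageLengthBrowser()
    tab1, tab2 = {}, {}
    results = {}

    # 1. Two frames from T on one page share a partition.
    browser.navigate(tab1, "https://a.example/")
    browser.frame_storage(tab1, "t.example")["uid"] = "abc"
    results["same_page_shared"] = (
        browser.frame_storage(tab1, "t.example").get("uid") == "abc"
    )

    # 2. The same page loaded in a second tab gets its own, empty partition.
    browser.navigate(tab2, "https://a.example/")
    results["other_tab_empty"] = "uid" not in browser.frame_storage(tab2, "t.example")

    # 3. Reloading the page discards the previous partition.
    browser.navigate(tab1, "https://a.example/")
    results["reload_empty"] = "uid" not in browser.frame_storage(tab1, "t.example")

    # 4. A page on a different first-party also starts with empty storage.
    browser.frame_storage(tab1, "t.example")["uid"] = "def"
    browser.navigate(tab1, "https://b.example/")
    results["other_site_empty"] = "uid" not in browser.frame_storage(tab1, "t.example")
    return results
```

Every entry returned by `run_examples()` is `True` in this model: the identifier written by the third-party is visible only within the page load that created it.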


3.2 Prototype Implementation 


We implement our page-length storage prototype as a set 
of patches to Brave 1.12.48 (based on Chromium 83). We 
use Brave, and version 1.12.48 specifically, in order to use 
the latest revision of PageGraph for data collection, per Sec- 
tion 4.1.4. However, our patches are completely independent 
of PageGraph’s patches and can be built without them present. 
The most relevant change from stock Chromium provided by 
Brave is its gentler approach to third-party storage blocking, 
which it enables by default. Instead of raising a JavaScript 
exception on script access to blocked storage (per the speci- 
fication), Brave makes the access a silent no-op, returning a 
null value. 

Chromium’s standard architecture includes content and 
storage isolation mechanisms relevant to our design goals. In 
addition to classic SOP enforcement, Chrome isolates con- 
tent rendering and JavaScript execution into separate render 
processes partitioned on a same-site basis (see Section 2.1). 
Each render process uses one storage partition, which can 
be persistent (the default) or ephemeral (private mode), and 
which can additionally be partitioned by arbitrary identifiers 
(for Chrome apps and extensions). HTTP traffic is managed 
by a dedicated network process, which chooses a storage
partition for HTTP cookies based on the frame initiating the 
request. 

We exploit this existing site storage isolation framework to 
prototype page-length storage with minimal changes to the 
browser. Our classification of frames and requests as third- 
party reuses the same-site logic already in Chromium and 
is always relative to the top-level page URL (not <iframe> 
URLs). Each time a tab’s top-level frame loads a page URL, 
we generate and store a UUID identifying that load event (the 
load key). When third-party frames are subsequently created 
and assigned to separate render processes, we augment the 
third-party site identifier with the top-level frame’s current 


load key to enforce page-level isolation between ephemeral 
third-party storage partitions. HTTP requests to third-parties 
are bound to the associated ephemeral third-party storage 
partition, which is created on demand if necessary. The result 
is unchanged first-party storage behavior, and fully functional 
third-party storage that lives only as long as the containing 
page document, as in Figure 1d.
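The load-key mechanism can be sketched as follows. This is an illustrative model rather than the actual patch: `TopLevelFrame`, `partition_key`, `PartitionTable`, and the `site|load_key` string format are all assumptions made for the example, not Chromium identifiers.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TopLevelFrame:
    site: str = ""
    load_key: str = ""

    def commit_navigation(self, site: str) -> None:
        """Each top-level load (including a reload) gets a fresh load key."""
        self.site = site
        self.load_key = str(uuid.uuid4())


def partition_key(top: TopLevelFrame, frame_site: str) -> str:
    """First-party frames keep their usual key; third-party keys gain the load key."""
    if frame_site == top.site:
        return frame_site
    return f"{frame_site}|{top.load_key}"


@dataclass
class PartitionTable:
    partitions: dict = field(default_factory=dict)

    def for_request(self, top: TopLevelFrame, destination_site: str) -> dict:
        """Bind a request to its partition, creating the partition on demand."""
        key = partition_key(top, destination_site)
        return self.partitions.setdefault(key, {})
```

Because the load key changes on every committed top-level navigation, a third-party frame and a third-party HTTP request issued during the same page load resolve to the same partition, while anything issued after a reload resolves to a new, empty one. This sketch omits discarding old partitions, which the policy also requires.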


3.3 Implementation Remarks 


The changes required to prototype page-length storage proved 
deceptively small. A total of 276 lines of C++ were added, 
changed, or removed by our patches. The scale of these 
patches, small relative to the millions of source code lines 
of Chromium, belies the challenge of finding the right places
to patch. Most changes relating to storage partition creation 
and isolation were confined to the main “browser” process 
in Chromium’s multi-process architecture, which has access 
to the entire frame tree for each tab, and were thus relatively 
straightforward to implement. Binding third-party HTTP re- 
quests to isolated, ephemeral storage partitions on demand, 
however, crossed process boundaries into the network process, 
which does not have access to frame tree context information, 
and required additional IPC messaging and timing concerns. 

The implementation demonstrated correctness and ro- 
bustness fully sufficient for prototype testing. The avail- 
able Chromium unit tests all passed, except for a few 
implementation-specific assertions we expected to fail after
our changes. Manual testing of multiple scenarios like
the examples from Section 3.1, including nested third-party
<iframe>s, showed expected behavior in all cases. Furthermore,
the prototype’s error rate during automated crawls
compared favorably with the stock permissive policy (see
Section 5.1).

Prototype performance proved adequate in practice. Because
performance was not a design priority for the prototype,
we did not run any benchmarks.
In theory, our approach should reduce performance over stock 
Chromium: it can produce more render processes, can in- 
volve more I/O operations creating temporary directories, 
and definitely invokes additional IPC overhead between net- 
work and render processes. However, both manual testing and 
automated crawling with the prototype revealed no obvious 
performance degradation. Furthermore, none of these issues 
are inherent in the policy itself, and there is no reason to be- 
lieve a performance-tuned production implementation would 
produce any significant overhead compared to traditional poli- 
cies. 


4 Methodology 


We evaluate our proposed policy by comparing its tracking 
and compatibility performance against alternative policies 
during automated, stateful crawls of popular web sites. 


4.1 Crawl Methodology 


Here we describe our data collection procedure in sufficient 
detail to permit straightforward experiment reproduction. 


4.1.1 Target URLs 


We generated a seed list of URLs to visit in parallel using a 
stateless pilot crawl of the Tranco 1k sites [43]. To achieve 
depth and representative sampling of web content, we must 
explore more than just the “landing page” of each site. But 
each of our 8 parallel crawls must visit the same sequence 
of page URLs to produce comparable results. Coordinating 
the link spidering and selection process across parallel crawls 
introduces needless engineering complexity. Our solution was 
to perform a stateless pilot crawl using stock Brave to visit the 
Tranco 1k sites’ landing pages and spider three links deep into
the site structure. This approach, using Tranco list snapshot 
JZZY, produced 3,419 total deduplicated page URLs to visit. 


4.1.2 Policy Variants 


We collect data using four distinct policy variants: 


Permissive: Allows all forms of third-party storage, as 
per Figure 1a. Stock Chrome behavior. Presumed to
cause no breakage. 


Strict third-party storage blocking: Blocks all forms 
of third-party storage, as per Figure 1b. Treats access as
no-op. Presumed to cause the most breakage. 


Site-keyed: Partitions persistent third-party storage by 
first-party eTLD+1, as per Figure 1c. Alternative to our 
proposed policy, inspired by elements of Safari ITP. 


Page-length: Isolates third-party storage in ephemeral, 
per-page partitions, as per Figure 1d. Our proposed pol- 
icy. 
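
The four variants differ mainly in which storage partition a
third-party frame is handed. The Python sketch below models
that choice as a partition key. It is an illustration only: the
names `Policy`, `partition_key`, and `page_id` (standing in for
one top-level page load) are assumptions made for this example
and do not come from the Chromium patches.

```python
from enum import Enum
from typing import Optional, Tuple


class Policy(Enum):
    PERMISSIVE = "permissive"
    BLOCKING = "strict third-party storage blocking"
    SITE_KEYED = "site-keyed"
    PAGE_LENGTH = "page-length"


def partition_key(policy: Policy, third_party: str, first_party: str,
                  page_id: int) -> Optional[Tuple]:
    """Return a hypothetical key naming the storage a third-party frame sees.

    third_party and first_party are eTLD+1 strings; page_id stands in for
    one top-level page load. None means storage access is a no-op.
    """
    if policy is Policy.PERMISSIVE:
        # One shared, persistent store per third party, on every site.
        return (third_party,)
    if policy is Policy.BLOCKING:
        return None
    if policy is Policy.SITE_KEYED:
        # Persistent, but separate for each embedding first-party site.
        return (first_party, third_party)
    # Page-length: a fresh partition per page load, discarded afterwards.
    return (page_id, third_party)


# The same third party embedded on two sites, then on a repeat visit.
visits = [("news.example", 1), ("shop.example", 2), ("news.example", 3)]
for policy in Policy:
    keys = [partition_key(policy, "tracker.example", site, page)
            for site, page in visits]
    print(policy.name, keys)
```

Under this model, identical keys across visits mean the third
party can read back state it stored earlier. Only the permissive
keys repeat across sites, and only the permissive and site-keyed
keys repeat across visits to the same site.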


4.1.3 Crawl Execution 


We executed our stateful crawls in parallel across all stor- 
age policies without any simulated user interactions. We 
deployed two instances of each tested policy to verify be- 
havioral consistency and provide similarity-score baselines 
(see Section 4.2.3). The crawlers maintained independent, 
persistent user profiles for each policy instance to maintain 
realistic state across all sequential page visits. The full exper- 
iment included 2 iterations crawling the master URL list to 
provide data on cross-time tracking across repeat visits. All 
crawls were performed in parallel and simultaneously (but 
without active synchronization between profiles) from a sin- 
gle network vantage point. Each page visit was performed in 
a freshly launched, non-headless (i.e., rendering to the Xvfb 
headless display server) browser instance. Navigation was 


allowed to time out after 30 seconds. Assuming no naviga- 
tion timeout, our crawlers waited for 30 seconds after the 
DOMContentLoaded event (i.e., main document fetched and
parsed but subresources not fully loaded yet) before tearing 
down the browser instance. No simulated user interactions 
were attempted. 


4.1.4 PageGraph Instrumentation 


We use PageGraph, an instrumentation system built into an ex- 
perimental branch of Brave, to record internal page behaviors. 
PageGraph patches the V8 JS engine and the Blink HTML ren- 
dering engine to capture and annotate a graph of each HTML 
document’s DOM structure and the events that constructed 
and modified it. Nodes represent entities such as DOM el- 
ements, scripts, HTTP resources, storage mechanisms, and 
a selective subset of builtin and DOM-provided JavaScript 
APIs. Edges represent relationships between nodes such as 
DOM structures and script interactions with DOM elements, 
DOM events, JavaScript APIs, and HTTP requests. The set 
of non-structural edges in each of these graphs constitutes the
dynamic behaviors of the originating page. Behavioral-edge- 
set similarity can be quantified using Jaccard index scores to 
provide a useful proxy for behavioral compatibility among 
compared storage policies. 


4.2 Evaluation Methodology 


We evaluate our proposed policy’s privacy and compatibil- 
ity performance using full-scale quantitative stateful tracking 
metrics, full-scale quantitative site behavior similarity metrics, 
and randomly-sampled qualitative assessment of site break- 
age. 


4.2.1 Preliminary Data Filtering 


We focus our analysis on frames of interest: i.e., third-party 
frames not flagged as advertisements. Our classification of 
third-party vs. first-party frames is based on eTLD+1 matches 
derived from the Public Suffix List [14]. Frames loaded from 
the same eTLD+1 as the main page URL are first-party 
frames; all others are third-party frames. We eliminate from 
consideration all first-party frames and third-party frames 
flagged as advertising content by the community-maintained 
EasyList [5]. This filtering eliminates noise from our evalua- 
tion: first-party storage is not affected by our policy change, 
and we are unconcerned about breakage of known advertising 
content. 
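
The filtering step can be sketched as follows. This is a
minimal illustration, not the analysis code: `PUBLIC_SUFFIXES`
is a tiny hand-written stand-in for the real Public Suffix List
(and ignores its wildcard and exception rules), and the `is_ad`
flag is assumed to have been computed beforehand by matching
the frame URL against EasyList.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

# Tiny stand-in for the Public Suffix List (assumption for illustration).
PUBLIC_SUFFIXES = {"com", "org", "net", "uk", "co.uk"}


def etld_plus_one(url: str) -> str:
    """Return the registrable domain (eTLD+1) of a URL."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Scanning from the left finds the longest matching public suffix
    # first; the eTLD+1 is that suffix plus one more label.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in PUBLIC_SUFFIXES:
            return ".".join(labels[max(i - 1, 0):])
    return host


def frames_of_interest(page_url: str,
                       frames: Iterable[Tuple[str, bool]]) -> List[str]:
    """Keep third-party frames that were not flagged as advertisements.

    frames yields (frame_url, is_ad) pairs.
    """
    site = etld_plus_one(page_url)
    return [url for url, is_ad in frames
            if etld_plus_one(url) != site and not is_ad]
```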


4.2.2 Quantitative Privacy Assessments 


Tracking Potential. The central metric we use to quantify 
potential for stateful cross-site and cross-time tracking by 
third-parties is the potentially identifying cookie flow (PICF). 


A cookie flow is the combination of an HTTP cookie and 
a third-party eTLD+1 receiving that cookie. We consider 
cookie flows potentially identifying when the cookie values 
meet a tunable minimum size threshold and are unique to 
a single browser profile during our stateful crawls. There 
are other forms of third-party storage available, and other 
channels by which identifying tokens can be transmitted to 
third-parties. But we use cookies as a representative measure 
of stateful tracking because they are unambiguous in structure, 
ubiquitous as tracking IDs, and essentially unrestricted by 
stock Chrome, our baseline. (Both our page-length storage 
and site-keyed implementations apply their storage policies 
to all forms of third-party storage, not just cookies.) 
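
A minimal sketch of the PICF selection rule, under stated
assumptions: the record layout `CookieObservation`, the function
name, and the default threshold of 8 characters are all
hypothetical (the threshold is described above only as tunable).
The sketch also simplifies uniqueness to "this receiving domain,
cookie name, and value were observed in exactly one profile".

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set, Tuple


class CookieObservation(NamedTuple):
    profile: str      # crawler profile that sent the cookie
    third_party: str  # eTLD+1 receiving the cookie
    name: str
    value: str


def potentially_identifying_flows(
        observations: Iterable[CookieObservation],
        min_length: int = 8) -> Set[Tuple[str, str, str]]:
    """Return (third_party, name, value) flows that could identify a profile."""
    profiles_by_flow = defaultdict(set)
    for obs in observations:
        flow = (obs.third_party, obs.name, obs.value)
        profiles_by_flow[flow].add(obs.profile)
    return {
        flow for flow, profiles in profiles_by_flow.items()
        # Long enough to carry an identifier, and seen in one profile only.
        if len(flow[2]) >= min_length and len(profiles) == 1
    }
```

Values shared by several profiles (for example, a locale or a
consent flag) are excluded because they cannot distinguish one
browser from another.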


Cross-Site Tracking. Identical PICFs seen across multiple 
distinct top-level sites visited represent potential for cross-site 
tracking by the associated third-party domain. We aggregate 
cross-site PICFs to count the total number of top-level sites 
across which each distinct third-party domain seen could have 
tracked our crawler profiles, giving us summary scores of 
“cross-site trackability” by which to compare all our storage 
policies. These scores can be visualized using cumulative sum 
curves, as shown in Section 5.2. 
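
The cross-site aggregation might look like the sketch below,
written for a single crawler profile. The tuple layout and
function names are assumptions for illustration, and ordering
the scores from largest to smallest before summing is one
plausible choice for the curve rather than a documented one.

```python
from collections import defaultdict
from itertools import accumulate
from typing import Dict, Iterable, List, Tuple


def cross_site_trackability(
        sightings: Iterable[Tuple[str, str, str]]) -> Dict[str, int]:
    """Count top-level sites each third party could link together.

    sightings yields (third_party, cookie_id, top_level_site) for PICFs
    observed in one profile.
    """
    sites_by_flow = defaultdict(set)
    for third_party, cookie_id, site in sightings:
        sites_by_flow[(third_party, cookie_id)].add(site)
    sites_by_party = defaultdict(set)
    for (third_party, _), sites in sites_by_flow.items():
        if len(sites) > 1:  # the identical flow appeared on several sites
            sites_by_party[third_party] |= sites
    return {party: len(sites) for party, sites in sites_by_party.items()}


def cumulative_curve(scores: Dict[str, int]) -> List[int]:
    """Cumulative sum of per-third-party scores, largest first."""
    return list(accumulate(sorted(scores.values(), reverse=True)))
```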


Cross-Time Tracking. PICFs seen on a given top-level site 
across multiple pages/crawls represent potential for cross- 
time, or visit-to-visit, tracking by a given third-party domain. 
We aggregate cross-time PICFs to count the total number of 
third-party domains which could have tracked our crawler pro- 
files for each distinct top-level site domain visited, giving us 
summary scores of “cross-time trackability” by which to com- 
pare all our storage policies. These scores can be visualized 
using cumulative sum curves, as shown in Section 5.3. 
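
The cross-time aggregation inverts the grouping: it counts
third parties per top-level site instead of sites per third
party. In this sketch the tuple layout and the `visit` label
(any identifier for one page load or crawl iteration) are
hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def cross_time_trackability(
        sightings: Iterable[Tuple[str, str, str, str]]) -> Dict[str, int]:
    """Count third parties able to recognize a return visitor on each site.

    sightings yields (top_level_site, third_party, cookie_id, visit) for
    PICFs observed in one profile.
    """
    visits_by_flow = defaultdict(set)
    for site, third_party, cookie_id, visit in sightings:
        visits_by_flow[(site, third_party, cookie_id)].add(visit)
    parties_by_site = defaultdict(set)
    for (site, third_party, _), visits in visits_by_flow.items():
        if len(visits) > 1:  # same flow on the same site, different visits
            parties_by_site[site].add(third_party)
    return {site: len(parties) for site, parties in parties_by_site.items()}
```

A third party whose cookie value changes on every visit, as it
would when each page load starts from empty storage, never
accumulates more than one visit per flow and so is not counted.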


4.2.3 Quantitative Compatibility Assessment 


We assess site compatibility across storage policies using a 
quantifiable proxy measure: similarity of internal page be- 
haviors as reported by PageGraph. Our insight is to presume 
no storage-based breakage for permissive profiles and some 
unknown (but non-zero) amount of breakage on strict third- 
party storage blocking profiles. If alternative policy (e.g., 
page-length storage) profiles produce content behaviors more 
similar to the permissive baseline than do the strict third- 
party storage blocking profiles, then the alternate policy is 
less likely than strict third-party storage blocking to cause 
breakage. 

We model and compare content behaviors using the set 
of non-structural (i.e., action or event) edges in PageGraph 
representations of relevant frames. Similarity between edge 
sets can be measured using the Jaccard index: $J(A,B) = |A \cap B| / |A \cup B|$.
Index scores range from 0 (no intersection) to 1 (equality). 
We consider the score undefined when both sets were empty. 
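
As a concrete illustration, the Jaccard index (the size of the
intersection divided by the size of the union) can be computed
as below. Representing an undefined score as `None` is a choice
made for this sketch.

```python
from typing import Hashable, Optional, Set


def jaccard(a: Set[Hashable], b: Set[Hashable]) -> Optional[float]:
    """Jaccard index of two sets, or None when both sets are empty."""
    if not a and not b:
        return None  # undefined: there is nothing to compare
    return len(a & b) / len(a | b)
```

For example, two edge sets sharing two of four distinct edges
score 0.5.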

We compare content behaviors across identical frames 
loaded on identical pages across all tested policies. Frames 


and pages are identified and matched by full URL. The simi- 
larity score of the two permissive profiles provides the compat- 
ibility baseline: the presumed best-possible similarity score 
for that frame/page instance. The other profiles are each com- 
pared with a single permissive profile to provide similarity 
scores to compare against the baseline. The cumulative sum 
of all frame/page instance similarity scores for each profile 
can be visualized to show which policies track closest to the 
baseline across all visited pages (see Section 5.4). 
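
One way to build such a curve is sketched below. The function
name is hypothetical, and sorting scores from highest to lowest
and dividing by the number of defined instances (so that perfect
similarity everywhere ends at 1.0) are assumptions about the
presentation, not a description of the plotting code.

```python
from itertools import accumulate
from typing import Iterable, List, Optional


def normalized_cumulative_similarity(
        scores: Iterable[Optional[float]]) -> List[float]:
    """Cumulative sum of similarity scores, scaled so the maximum is 1.0.

    Undefined scores (None) are skipped.
    """
    defined = sorted((s for s in scores if s is not None), reverse=True)
    if not defined:
        return []
    count = len(defined)
    return [total / count for total in accumulate(defined)]
```

The final value of the curve equals the mean similarity score
for that profile, which makes curves for different policies
directly comparable against the permissive baseline.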

We optimized the set of PageGraph node types included in 
our behavioral sets to maximize the distance between strict 
third-party storage blocking policy scores and the permis- 
sive baseline score. Our intuition is that the baseline score 
provides a threshold of “reasonable” behavioral differences 
between two different instances of the same content loaded in 
different browsers at about the same time. The farther away 
from this baseline a policy scores, the greater the likelihood 
of unreasonable, or breaking, differences in behavior. 

We identified 11 PageGraph node types relevant to behav- 
ioral analysis, a set small enough to be amenable to brute force 
optimization across its power set. Optimization relied on a 
random sample of 100 frame/site instances extracted from a 
preliminary full-scale crawl dataset, whose unoptimized simi- 
larity curves matched those of the entire data set, indicating 
a representative sampling. On this data subset we tested the 
strict third-party storage blocking separation from the per- 
missive baseline for every subset of relevant PageGraph node 
types. The results confirmed our intuition that the least helpful 
node types were structural elements like HTML elements and 
DOM text blocks; less intuitively, they also showed that Page- 
Graph’s set of instrumented DOM manipulation JavaScript 
APIs was similarly unhelpful. The final optimal node type set
comprised scripts and PageGraph’s selected JavaScript builtin 
APIs (e.g., date functions), HTTP resources, frame structures 
(DOM roots and frame-owning elements), and storage mech- 
anisms (cookie jars, local and session storage buckets). Only 
edges (i.e., behaviors) linking these node types are included 
in the behavior similarity results presented in Section 5.4. 
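
The brute-force search over node-type subsets can be sketched
as follows. Everything here is illustrative: edges are modeled
as hypothetical `(source_type, target_type, label)` tuples, each
sample is a triple of edge sets (two permissive profiles and one
blocking profile), and "separation" is taken to be the
difference between mean baseline similarity and mean blocking
similarity.

```python
from itertools import combinations
from statistics import mean
from typing import Iterable, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source node type, target node type, label)
Sample = Tuple[Set[Edge], Set[Edge], Set[Edge]]


def jaccard(a: Set[Edge], b: Set[Edge]) -> Optional[float]:
    if not a and not b:
        return None
    return len(a & b) / len(a | b)


def keep(edges: Set[Edge], node_types: Set[str]) -> Set[Edge]:
    """Keep only edges whose endpoints are both of a selected node type."""
    return {e for e in edges if e[0] in node_types and e[1] in node_types}


def separation(samples: Iterable[Sample],
               node_types: Set[str]) -> Optional[float]:
    """Mean baseline similarity minus mean blocking similarity."""
    baseline, blocked = [], []
    for permissive_a, permissive_b, blocking in samples:
        a = keep(permissive_a, node_types)
        base = jaccard(a, keep(permissive_b, node_types))
        block = jaccard(a, keep(blocking, node_types))
        if base is None or block is None:
            continue  # undefined score: skip this instance
        baseline.append(base)
        blocked.append(block)
    if not baseline:
        return None
    return mean(baseline) - mean(blocked)


def best_node_types(samples, candidates: Set[str]):
    """Try every non-empty subset of candidates; return the best one."""
    samples = list(samples)
    best, best_score = None, None
    for size in range(1, len(candidates) + 1):
        for subset in combinations(sorted(candidates), size):
            score = separation(samples, set(subset))
            if score is not None and (best_score is None or score > best_score):
                best, best_score = set(subset), score
    return best, best_score
```

With 11 candidate node types the loop visits 2^11 - 1 = 2,047
non-empty subsets, which is why exhaustive search is practical
here.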


4.2.4 Qualitative Compatibility Assessment 


We augment our quantitative proxy assessment of site compat- 
ibility with blinded multi-grader manual analysis for breakage 
within a random sample of sites loading popular third-party 
content. Our methodology is heavily inspired by a similar 
experiment by Snyder et al. [48]. 

To select our set of URLs to test, we first identified the 
most popular third-party, non-ad-blocked frame URLs within 
our crawl dataset. We sorted these by the harmonic mean of 
the number of pages embedding that frame and the number 
of third-party cookies set for the frame’s eTLD+1. This met- 
ric is higher for frames which appear on a large number of 
sites and have access to a large number of cookies: prime 
candidates for testing third-party storage policy changes. We 


selected the top 10 frame URLs with distinct eTLD+1s, to 
have higher content diversity. We further filtered out frames 
which appeared only on non-English sites (e.g., frames from 
baidu.com and alicdn.com), and frames which did not have a 
content type of either HTML or JavaScript (e.g., frames from 
sharethis.com with a content type of image). 
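
The ranking step can be illustrated with the short sketch
below, which uses Python's standard `statistics.harmonic_mean`.
The tuple layout and function name are assumptions, and the
language and content-type filters described above are omitted.

```python
from statistics import harmonic_mean
from typing import List, Tuple

# (frame_url, frame eTLD+1, pages embedding the frame, cookies set)
Frame = Tuple[str, str, int, int]


def rank_frames(frames: List[Frame], top_n: int = 10) -> List[str]:
    """Pick the top_n frame URLs by harmonic mean, one per eTLD+1."""
    ranked = sorted(frames,
                    key=lambda f: harmonic_mean([f[2], f[3]]),
                    reverse=True)
    chosen, seen = [], set()
    for url, etld1, _, _ in ranked:
        if etld1 in seen:
            continue  # keep only the best frame for each eTLD+1
        seen.add(etld1)
        chosen.append(url)
        if len(chosen) == top_n:
            break
    return chosen
```

The harmonic mean rewards frames that score well on both
counts: a frame embedded on 100 pages but with only 2 cookies
scores about 3.9, while one with 20 pages and 20 cookies scores
20.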

For each of the 10 selected frame URLs, we randomly se- 
lected 5 candidate page URLs observed to embed that frame 
URL during our crawls, giving us 50 candidate URLs for man- 
ual analysis. Upon closer inspection of the frame contents, 
some frames did not have any real estate on the page and sim- 
ply contained JS script, which would interact or render with 
DOM elements elsewhere on the containing site. With this 
insight, we adopted a holistic approach to evaluate breakage 
rather than simply observing the behavior of one frame. 

We had two graders evaluate each of our candidate URLs 
for the policy variants in Section 4.1.2. The graders would 
visit a candidate URL first with a permissive profile, the 
Chrome default. This visit is our control visit for manual 
analysis. It was followed by a visit to the same URL with 
each of the site-keyed, page-length, and strict third-party stor- 
age blocking profiles. Every visit, including the control visit, 
was independent of all others, with a fresh browser profile 
to ensure no browsing state carried over between tests/visits. 
To keep our graders unbiased, subsequent visits to the candi- 
date URL after the control visit were randomly coded so the 
graders did not know which profile they were using. 

In our holistic approach to evaluation, each grader would 
visit the candidate URL with the control profile first. We
instructed each grader to perform as many interactive actions
as possible on the candidate site within one minute, which is
the average dwell time for a typical web user on a webpage [34].
Activities depended on category of site: on news portals, our 
graders would skim through, search for articles, watch embed- 
ded videos, click on ads, or try to sign-up for newsletters; on 
shopping sites, they would search for products, add products 
to the shopping cart, and initiate a checkout; on product sites, 
graders would either skim informational material, or try any 
video streams available, etc. Subsequently, the graders would 
visit the same URL with the 3 coded profiles, performing 
similar actions as during the control visit, observing any devi- 
ations from the control visit, and scoring their visit on a 1 to 
3 scoring scale. The graders gave a coded profile visit a score 
of 1 if the visit did not have any perceptible deviations from 
the control; 2 if there were some deviations from the control 
visit, but this did not hinder their visiting experience or the 
tasks the graders attempted on the site; and 3 if the visit had 
significant deviations from the control, preventing the graders 
from replicating their control visit activities. 

Figure 3: Crawl success rate varied modestly across policies
but was always reasonably high. (Stacked bars per policy, from
0% to 100% of page visits; legend: successful, non-crash error,
non-PG crash, PG crash.)

Given the highly subjective nature of the evaluation scheme,
we carefully assess grader agreement. Our graders evaluated
the candidate URLs independently, unaware of the other
grader’s scores. In our evaluation, our graders had a high
agreement percentage (95.33%). We also computed the Cohen’s
Kappa inter-rater reliability statistic [27] as 0.69, indicating
substantial agreement between our graders [36]. We present
the results of our manual evaluation in Section 5.5.
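
For readers unfamiliar with the two agreement statistics, the
sketch below computes both from paired grader scores. The
function names are chosen for this example, and any scores
passed to them here are made up; they are not the graders'
actual data.

```python
from collections import Counter
from typing import Sequence


def percent_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of items on which both graders gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: probability both graders pick the same score if
    # each scored at random according to their own score frequencies.
    expected = sum(counts_a[k] * counts_b[k]
                   for k in set(counts_a) | set(counts_b)) / (n * n)
    if expected == 1:
        return 1.0  # both graders gave one identical score throughout
    return (observed - expected) / (1 - expected)
```

Kappa is lower than raw agreement whenever one score dominates,
because graders who nearly always assign the same score would
agree often by chance alone.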


5 Results 


Our experimental results show that page-length storage com- 
bines best-case stateful tracking protection with near-best- 
case site compatibility. 


5.1 Crawl Statistics 


Our stateful web crawls ran from September 12-16 on a single 
Linux virtual machine (40 vCPUs, 100 GiB RAM). Combined,
the crawls visited 27,352 total pages using 8 user profiles and 
produced 280,219 PageGraph files (405 GB). 

Error rates were acceptable (Figure 3) if somewhat am- 
plified by PageGraph internal consistency assertion failures. 
PageGraph’s instrumentation is expansive and tracks complex 
interactions between JavaScript execution, DOM manipula- 
tion, and network traffic. Whenever unexpected corner cases 
(or bugs) prevent it from establishing unambiguous context for 
an event or activity, PageGraph logs the issue and terminates 
the browser rather than recording unreliable data. 


5.2 Privacy: Cross-Site Tracking Potential 


Page-length storage eliminates stateful cross-site tracking as 
effectively as does strict third-party storage blocking. See 
Figure 4. The cumulative sum curves show the aggregate 
counts of sites across which third-parties could track users un- 
der different policies, calculated using the tracking-potential 
heuristics described in Section 4.2.2. Page-length, site-keyed,
and strict third-party storage blocking policies are roughly
equal at preventing stateful cross-site tracking. This result is
logical and unsurprising: if third-party storage is not available
(or is partitioned by first-party site, or is strictly ephemeral), it
cannot be used to pass identifying state across site boundaries.

Figure 4: Of our tested policies, all but permissive essentially
eliminated stateful cross-site tracking potential.


5.3 Privacy: Cross-Time Tracking Potential 


Page-length storage also eliminates stateful cross-time track- 
ing as effectively as does full third-party storage blocking, 
which is a significant improvement over site-keyed storage. 
See Figure 5. These curves show the cumulative sums of third- 
parties which could longitudinally track return visitors across 
the Tranco 1k sites, as described in Section 4.2.2. Unsurpris- 
ingly, permissive policy allows the most cross-time tracking; 
its strong cross-site tracking ability implies cross-time track- 
ing ability. Persistent third-party storage, even if partitioned 
by first-party site context, is still accessible on repeat visits, 
allowing cross-time tracking. Thus, page-length and strict 
third-party storage blocking policies equally provide stronger 
cross-time tracking protection than site-keyed policy can. 


5.4 Compatibility: Quantitative Assessment 


Page-length storage produces page behaviors much closer 
to the permissive policy baseline than does full third-party 
storage blocking, as shown in Figure 6. These curves show 
cumulative sums of similarity scores between one of our per- 
missive crawl profiles and all other profiles, normalized to 
show 1.0 as the maximum possible score (perfect similarity 
on all instances). The curve showing the similarity scores 
between the two permissive profiles provides a baseline (i.e., 
the best scores observed). Note the high consistency between 
all pairs of same-policy profiles. While even the baseline falls
short of perfect similarity, there is a clear signal in the
grouping of policies. The strict third-party storage blocking
policies produced the curves farthest from the baseline, as
expected, well isolated from all the other policies. The
non-blocking policies (site-keyed and page-length) both
produced curves much closer to the baseline than to strict
third-party storage blocking. The stark separation of curves
strongly suggests that the non-blocking policies induce
significantly less overall deviation from “normal” behavior
(and thus less breakage) than does strict third-party storage
blocking.

Figure 5: Our page-length policy significantly outperforms
both permissive and site-keyed policies at reducing cross-time
tracking potential.


5.5 Compatibility: Qualitative Assessment 


As described in Section 4.2.4, two graders independently
evaluated our set of 50 candidate URLs under each of the three
coded profiles (site-keyed, page-length, and strict third-party
storage blocking), looking for deviations from our control
profile: permissive, the Chrome default. The graders gave each
visit a score on a scale of 1 to 3, as detailed in Section 4.2.4.
We conservatively counted any deviation from the control visit
(i.e., any score greater than 1) as a form of breakage. We
summarize the instances of graded breakage for each profile in
Table 1. Grader notes on several reported breakages for
page-length and site-keyed suggest that at least some of those
deviations involved render process crashes rather than actual
content breakage, possibly due to obscure bugs in those
prototypes.

Considering the 5 breakages observed for the strict third-party
storage blocking profile, the page-length profile either scored
similar (2 cases) or improved (3 cases) in terms of raw grader
scores. Compared with the site-keyed profile (4 breakages), the
page-length profile again scored either equal (2 cases) or
better (2 cases). There were no cases where the page-length
profile broke with a worse score than either the strict
third-party storage blocking or the site-keyed profile. We
concluded that the page-length profile performed reliably
better than the strict third-party storage blocking profile, and
that it performed as well as or better than the site-keyed
profile. The observed rate of breakage for strict third-party
storage blocking (10%) appears reasonable, and the roughly
2-to-1 advantage of page-length storage over strict third-party
storage blocking observed in manual testing parallels a similar
advantage in mean cumulative similarity score observed in
quantitative analysis (Section 5.4).

Figure 6: Our page-length policy produces page behaviors
within third-party frames much closer to the permissive
baseline than does the breakage-prone strict third-party storage
blocking policy.

Policy                 | Pages Broken | % Broken (n=50)
Site-keyed             | 4            | 8%
Page-length            | 2            | 4%
Third-party blocking   | 5            | 10%

Table 1: Candidate URL breakage as assessed by holistic
(whole-page) manual grading


6 Discussion 


6.1 Limitations 


The principal design limitation of page-length storage is the
fact that some useful third-party web components may simply
require persistent, non-partitioned storage. We suspect that
persistent storage for embedded third-party content is more a
matter of user convenience than essential functionality (e.g.,
customizing an embedded video player when the user is logged
into the third-party site hosting the video). In any case,
page-length storage can and should be augmented in production
with the requestStorageAccess API to allow the user to opt in
to persistent storage for specific third parties, either
universally or on a specific first party.


Our quantitative assessments of tracking and compatibility 
are subject to the limitations and risks of automated web 
crawls. While the scale of our crawl is modest, we believe the 
Tranco 1k provides a realistic sample of popular, mainstream 
web content and thus meets our evaluation needs. Spidering 
3 links deep past landing pages likewise provides reasonable 
sampling of site content without exhausting our time and 
space budget, as PageGraph can generate large volumes of 
data per page. All our crawlers were stateful and non-headless, 
giving them a fair chance at evading the most trivial forms of 
bot detection. More sophisticated bot detection depending on 
“human” interactions with page content should treat all profiles 
identically (as bots; we performed no interaction simulations). 
We thus believe that whatever impact bot detection had on 
our crawlers, it would have affected all our profiles similarly 
and not significantly skewed our results. 


6.2 Next Steps 


Page-length storage can be further, better evaluated by real 
users by deploying it first to browsers serving privacy- 
conscious audiences. Production implementations will be 
somewhat more complex than our prototype (to address per- 
formance and maintainability concerns) but should require 
only modest investment by browser vendors. Production im- 
plementations should use the requestStorageAccess API 
to allow user opt-in to useful third-party storage access. 
Vendors could then deploy page-length storage to privacy- 
conscious users already blocking third-party storage and ob- 
serve the impact on their site breakage reports. 


Ultimately, page-length storage can be standardized to pro- 
vide a near-best-of-both-worlds solution to the problem of per- 
sistent third-party storage abuse. Legacy content that assumes 
third-party storage access can be largely accommodated to 
minimize site breakage, without privacy loss. Modernized 
content can use the requestStorageAccess API to bypass
page-length storage with user consent and gain controlled ac- 
cess to persistent third-party storage. The user wins: content 
that really needs third-party storage access to provide tangible 
benefit to the user can do so with the user’s explicit permis- 
sion, but the risk of permission denial and user alienation will 
motivate publishers of content providing less compelling user 
benefit (e.g., advertisers and trackers) to make do with less 
intrusive technologies. 


7 Related Work


Stateful User Tracking. Storage-based user tracking, usually 
called “stateful” tracking and traditionally involving cookies, 
has been extensively studied. Mayer and Mitchell’s seminal 
third-party web tracking study covered both stateful and state- 
less techniques and introduced the influential FourthParty web 
measurement framework [35]. A contemporary stateful track- 
ing measurement work by Roesner et al. defined alternate 
terms “explicit” and “inferred” for stateful and stateless tech- 
niques, respectively, while measuring stateful tracking exclu- 
sively [45]. Acar et al.’s classic, large-scale user tracking mea- 
surement study emphasized stateful tracking and hinted at the 
nascent problem of cookie syncing [17]. A large-scale evalu- 
ation of stateful third-party tracking by Li et al. focused on 
cookies, the most prevalent form observed, and used machine- 
learning to identify third-party cookie tracking on 46% of 
the Alexa 10k [33]. Engelhardt and Narayanan’s extremely 
large-scale user tracking measurement study included both 
stateful and stateless techniques, covered the entire Alexa 
Top Million sites, and introduced the widely used OpenWPM 
web measurement framework [22]. Yang and Yue recently ex- 
tended classic tracking measurement methodologies to mobile 
web clients and reported distinctive but analogous groups of 
tracking domains compared to traditional desktop web track- 
ing [50]. Despite the increasing sophistication of web track- 
ing and countermeasure technologies, Fouad et al.’s recent 
exploration of obscure pixel-trackers showed classic third- 
party cookie tracking to still be effective and prevalent in 
the wild [24]. Zimmeck et al. even found traditional state- 
ful tracking techniques to provide usable building blocks 
for cross-device tracking via linking together independent 
tracking sessions from different devices [52], a phenomenon 
conceptually similar to cookie syncing. 


Cookie Syncing & Other State Transfers. Third-parties can 
collude to share stored user tracking identifiers and expand 
their tracking scope via cookie syncing. Olejnik et al. per- 
formed the first major measurement of cookie syncing in the 
wild, reporting that up to 27% of a user’s browsing history 
could be leaked via cookie syncing [39]. Falahrastegar et al. 
measured distinctive personal identifiers and entities sharing 
them across the web, focusing on the groups engaged in shar- 
ing and how user behavior affects sharing [23]. Our procedure 
for selecting potentially identifying cookie flows shares some 
similarities with their selection of personal identifiers. Pa- 
padopoulos et al. identified cookie syncing as a major source 
of hidden costs to users imposed by digital advertising on- 
line [42]. Subsequent work documented the state of the art 
in cookie syncing, reinforcing the importance of third-party 
cookies to contemporary tracking [41]. 

Tracking identifiers can be passed across first-party do- 
mains using means other than stored state. Stopczynski et 
al. provide evidence that modern defenses like Safari ITP 
are effective but are being actively attacked and evaded, e.g., 


via abuse of HTTP redirects passing identifiers in modified 
URLs [49]. For the moment, these attacks appear to constitute 
efforts to reestablish traditional cookie tracking disrupted by 
ITP rather than the emergence of a new tracking paradigm. 


Browser Fingerprinting. A major category of web privacy 
research for the past decade has involved stateless tracking 
via fingerprinting. The Panopticlick project’s seminal report
on browser fingerprintability [21] popularized the threat as 
a potential tracking vector and launched a flurry of related 
research. Acar et al. measured fingerprinting in the wild and 
found it much more prevalent than commonly estimated at the 
time [18]. Olejnik et al. dissected the infamous and quickly 
deprecated Battery Status API as a particularly egregious 
source of fingerprinting entropy [38]. Laperdrix et al. identi- 
fied new fingerprinting vectors from emerging desktop and 
mobile web technologies, but also identified potential trends 
toward reduced fingerprinting threats [32]. The current threat 
status of fingerprinting remains ambiguous: Gomez et al.
reported findings that Panopticlick-style identification has been
largely defeated in practice [25], but Pugliese et al. later pre- 
sented counter-arguments from data that such fingerprinting 
is still an effective threat [44]. 


Content Blocking. Published countermeasures against user 
tracking can be broadly categorized as either blocking 
tracking-related content (e.g., ads) before they enter the 
browser or changing browser implementations to mitigate 
unwanted effects from such content. As most ad and tracker 
blocking currently depends on filter lists, filter list
improvements and alternatives are a frequent research topic.
Gugelmann et al. used large-scale traffic analysis (15k users on
a campus network) to train a machine classifier of privacy- 
invasive tracking services, compared it to popular filter lists, 
and presented it as a mechanism for updating these lists 
faster and more effectively than the current crowd-sourced 
model [26]. The PageGraph instrumentation system has 
been used to demonstrate the effectiveness and efficiency 
of ad blocking via machine-learning trained on page graph 
data [29], to improve filter lists for non-English-speaking pop- 
ulations [47], and to detect filter list evasions in the wild [20]. 
Hu et al. analyzed the interconnectedness (or “tangle factor”) 
of first-party sites embedding the same third-party tracking 
content using real-world browsing data from volunteers in 
order to assess ad blocker effectiveness and to drive automatic 
partitioning of first-party sites into isolated multi-account 
containers [28]. 


Browser Policies & Mechanisms. Page-length storage belongs
to another category of tracking countermeasure research,
which focuses on evaluating and enhancing built-in
browser security policies. Hypothetical discussions of block- 
ing third-party storage, and of potential collusion by third- 
parties to work around it, predate the modern era of track- 
ing research [30]. Bauer et al. demonstrated practical formal 
browser security using a taint-analysis and data-flow policy 
enforcement engine built into Chrome 32; the system could
be used to enforce classic browser policies (SOP, CSP) or
prototype new ones [19]. Pan et al. prototyped a full
replacement of the traditional SOP with a hierarchy of nested security
principals, each layer able only to increase, not decrease, re- 
strictions on tracking [40]. Fingerprinting countermeasures 
that involve injecting randomness into known or suspected 
entropy sources to disrupt stateless tracking include Privarica- 
tor [37] and FPRandom [31]. Yu et al. described an elegantly 
generalized approach to tracking prevention at the data flow 
level using k-Anonymity, deployed in the privacy-focused 
Cliqz browser [51]. Our approach to quantifying tracking 
potential is loosely inspired by this data flow approach to 
defining privacy. 


8 Conclusion 


Our work addresses the lose-lose dilemma presented to 
browser developers by third-party storage: maintain the sta- 
tus quo and enable mass user tracking, or block third-party 
storage and break a significant amount of the useful web. 
Practical experience suggested it was rare for third-party con- 
tent to actually need persistent storage to provide desirable 
functionality to the user. We exploited this insight to design 
page-length storage, and the results show that a win-win (or 
at least a win-nearly-always-win) solution is possible to the 
old lose-lose dilemma. We share our contributions with the 
browser research and development community: the concep- 
tual design of page-length storage, a novel solution to the 
third-party state management problem in browsers; our met- 
rics for comparing the privacy and compatibility impact of 
storage policy changes; and our working prototype, made
available as open-source patches to Chromium, along with our
crawl dataset.